PERSPECTIVE

Convergent functional genomics of stem cell-derived cells 

AB Niculescu 

Stem cell technologies provide an exciting avenue to directly access the transcriptome of patients in neuronal-like cell types, 
which might have more direct relevance to brain research than other peripheral tissues (blood, fibroblasts). Enthusiasm should be 
tempered by concerns that artifacts and noise might be generated as part of the in vitro process of creating and maintaining these 
cell types. A solution may be to apply a Convergent Functional Genomics approach, where the data from stem cell-derived neuronal
cells are integrated, cross-validated and prioritized using independent lines of evidence from other approaches and platforms 
(human genetic data, human postmortem brain data, animal model data). I provide a brief overview and an example in support of 
such an approach. 
'Did I request thee, Maker, from my clay
To mould me man? Did I solicit thee 
From darkness to promote me?' 

— Milton, Paradise Lost 



GENE EXPRESSION AND STEM CELL-DERIVED CELLS 

Gene expression data directly reflect genetic inheritance and 
acquired genetic mutations, as well as environmental influences. 
Scientifically, such data may also help tie together and unravel epistasis
(co-acting gene expression, 1 'genes that change together work 
together'), as well as regulatory networks of non-coding single- 
nucleotide polymorphisms (SNPs), 2 epigenetic changes, chromatin 
modifications, non-coding RNAs and transcription factors 
responsive to environmental stimuli. Human gene expression 
studies have been carried out in postmortem brain, 3,4 as well as in 
peripheral blood, 5,6 fibroblasts, 7 olfactory epithelium-derived 
neurons 8 and, more recently, in induced pluripotent stem cell 
(iPSC)-derived neurons. 9,10 Each particular approach has strengths 
and limitations (Table 1). The quest for peripheral tissue read-outs 
and biomarkers is particularly important in psychiatry, as the 
target organ (brain) is not accessible to biopsies in live humans, for 
obvious practical and ethical reasons. The integration of genomics 
with phenomics (for example, quantitative clinical data), in 
particular the issue of whether a marker reflects state, trait, both, 
or neither, is important and often overlooked. The ability to 
correlate peripheral read-outs directly with mental states (for 
example, symptom severity), or indirectly with mental traits (for 
example, psychiatric diagnosis), determines what kind of biomarkers
can be discovered using different tissues and approaches.
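As an illustration of the state/trait distinction, the following minimal sketch assumes hypothetical data: a candidate state marker is tested by correlating its expression with symptom severity across repeated visits of one subject, whereas a trait-style contrast compares mean expression between diagnostic groups. The helper names (pearson_r, group_difference), the numbers and the choice of Pearson's r are assumptions for illustration, not the analysis used in any of the cited studies.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    mx, my = mean(x), mean(y)
    # Population covariance, matching the population standard deviations above.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def group_difference(cases, controls):
    """Difference in mean marker level between diagnostic groups (trait-style contrast)."""
    return mean(cases) - mean(controls)


# Hypothetical state data: one subject, five visits.
symptom_severity = [2, 5, 3, 8, 6]
marker_expression = [1.1, 2.0, 1.4, 3.1, 2.4]
state_r = pearson_r(symptom_severity, marker_expression)

# Hypothetical trait data: one measurement per subject.
trait_diff = group_difference(cases=[2.1, 2.4, 2.2], controls=[1.5, 1.6, 1.4])

print(f"state correlation r = {state_r:.2f}")
print(f"case-control difference = {trait_diff:.2f}")
```

A tissue sampled only once per subject (or postmortem) can support the second kind of contrast but not the first, which is the limitation Table 1 records as "lack of direct correlation with mental state information".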

iPSCs, in addition to future hypothetical organ-building 
regenerative medicine applications, may be more immediately 
useful for understanding disease, 11 and particularly for drug 
testing and drug discovery, 12 including personalized medicine 
approaches. However, concerns arise about genetic and gene 
expression artifacts induced by the in vitro stem cell creation 
process. Two of the four transcription factors used to create iPSCs
(c-Myc and KLF4) are oncogenic. Interestingly, histone deacetylase
inhibitors (HDACi), such as the neuropsychiatric agent valproate, 13 
may provide a safer alternative for helping transform adult cells 
into iPSCs. Valproate might also expand the pool of neural stem 
cells in the adult brain. 14 The effect of the HDACi per se on the 
gene expression landscape would have to be factored out in 
scientific studies. Olfactory epithelium-derived neuronal precursor 
cells may also have fewer transformation artifacts, 8,15 although the
cell culture and passaging artifacts remain in common with the
other cell culture approaches. Finally, all neuronal-like cells derived
with these methodologies need to be validated as truly reflective
of neurons. Methods used for this include, in increasing order of
relevance, neuronal biochemical marker testing (immunohistochemistry),
testing for synapse formation (electrophysiology) and functional
integration in vivo. 16

The gene expression data obtained from such cells arguably
need additional cross-validation for relevance to in vivo
functioning and disease states.



CONVERGENT FUNCTIONAL GENOMICS (CFG) 

Genetic and gene expression studies of medical disorders in humans
and in lower organism models (mice, rats, dogs, zebrafish, Drosophila,
Caenorhabditis elegans, yeast) are becoming increasingly integrated.
Particularly for genomics, the convergence and integration of data
across experimental modalities, technical platforms and species
provide a fit-to-disease way of extracting reproducible and
biologically important signal, in contrast to the fit-to-cohort effect
and limited reproducibility of human genetic analyses alone. Given
emerging data from the ENCODE project suggesting that a major
portion of the non-coding genome may contain regulatory variants,
convergent approaches are likely to be important for identifying
disease-relevant signal among the polymorphic variation present in
the population.

Table 1. Human neuropsychiatric gene expression studies

Human postmortem brain
  Strengths: target organ; neuronal cells
  Limitations: postmortem interval artifacts; lack of direct correlation with mental state information

Human blood
  Strengths: immediate access; non-transformed cells (except when lymphoblastoid cell lines are used); direct correlation with mental state information
  Limitations: non-neuronal

Fibroblasts
  Strengths: non- (or less) transformed cells
  Limitations: cell-culture artifacts; non-neuronal; lack of direct correlation with mental state information

Olfactory epithelium-derived neurons
  Strengths: neuronal-like cells; non- (or less) transformed cells
  Limitations: cell-culture artifacts; lack of direct correlation with mental state information

iPSC-derived neurons
  Strengths: neuronal-like cells
  Limitations: cell-culture artifacts; transformation artifacts; lack of direct correlation with mental state information

Abbreviation: iPSC, induced pluripotent stem cell.

[Figure 1 panels: Animal Model Studies (Animal Model Genetic Evidence, Transgenic or QTL, 1 pt; Animal Model Brain Evidence, 1 pt); Other Human Studies]

Figure 1. Convergent Functional Genomics: multiple independent lines of evidence for integration and prioritization of induced pluripotent stem (iPS)-derived neuronal cells data. CNV, copy number variant; QTL, quantitative trait loci.

CFG 1,5,17-24 is a powerful methodology developed over the past
15 years for extracting signal from noise by gene-level integration
of multiple independent lines of evidence from human and lower
organism model studies (genetic, gene expression, proteomics) of
brain, peripheral tissues and cell lines (Figure 1).
Lower organism model data can provide sensitivity and the ability to
conduct experimental manipulations not feasible in humans.
Human data provide more specificity and relevance to the human
disease. Combined, they yield an approach that increases our
ability to distinguish signal from noise even with limited-size
cohorts and data sets. CFG helps to identify and prioritize
candidate genes for the illness, using a polyevidence score.
All these lines of evidence are the result of independent
experiments. The virtue of this networked approach is that,
even if one or another of the 'nodes' (lines of evidence) becomes
questionable or non-functional in light of further evidence in the
field, the network is resilient and maintains its functionality. The
prioritization of candidates is conceptually similar to the Google
PageRank algorithm: the more links (lines of evidence) to a
candidate, the higher it will be prioritized. Subsequent biological
pathway analyses on these prioritized genes can uncover 
mechanistic aspects of the disease being studied. More recently, 
variations and expansions of this approach have been used 
successfully by other groups as well. 25,26 
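The polyevidence scoring and its PageRank-like logic can be sketched as follows, under stated assumptions: each line of evidence is reduced to a set of genes with support in it, each line contributes one point (as Figure 1 shows for the animal model lines; equal weighting of every line is an assumption here), and genes are ranked by their summed points. The gene names are hypothetical placeholders and cfg_scores is an illustrative helper, not the published CFG implementation.

```python
from collections import Counter


def cfg_scores(evidence, weights=None):
    """Sum weighted points per gene across independent lines of evidence.

    evidence: mapping of line-of-evidence name -> set of genes supported by it.
    weights:  optional mapping of line name -> points (assumed 1 point each).
    """
    weights = weights or {}
    scores = Counter()
    for line, genes in evidence.items():
        for gene in genes:
            scores[gene] += weights.get(line, 1.0)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical gene sets, for illustration only.
evidence = {
    "human_gwas": {"GENE_A", "GENE_B", "GENE_C"},
    "human_postmortem_brain": {"GENE_A", "GENE_D"},
    "human_blood": {"GENE_A", "GENE_B"},
    "animal_model_genetic": {"GENE_B"},
    "animal_model_brain": {"GENE_A", "GENE_E"},
    "ipsc_neurons": {"GENE_A", "GENE_D"},
}

for gene, score in cfg_scores(evidence):
    print(gene, score)

# Resilience: dropping one line of evidence lowers scores but, in this
# example, leaves the top-ranked candidate unchanged.
without_ipsc = {k: v for k, v in evidence.items() if k != "ipsc_neurons"}
print(cfg_scores(without_ipsc)[0])
```

Removing the iPSC line lowers GENE_A from 5 to 4 points without displacing it, a small-scale version of the network resilience described above.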



Our past work provides evidence for the advantages, reproducibility
and consistency of gene-level analyses of data, as opposed
to SNP-level analyses, pointing to the fundamental issue of
genetic heterogeneity at the SNP level. 27 In fact, it may be that the
more biologically important a gene is for higher mental functions,
the more heterogeneity it has at the SNP level and the more
evolutionary divergence it shows, for adaptive reasons. 28 A similar
diversity, for similar adaptive reasons, exists in immune system genes.
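A toy example, under hypothetical SNP-to-gene assignments, shows why gene-level analysis can be more reproducible than SNP-level analysis when there is heterogeneity at the SNP level: two cohorts may implicate entirely different SNPs that nonetheless fall in the same genes. The Jaccard index used here as the overlap measure, the SNP labels and the helper names are assumptions for illustration.

```python
def genes_hit(snp_hits, snp_to_gene):
    """Collapse SNP-level hits to the set of genes they are assigned to."""
    return {snp_to_gene[s] for s in snp_hits if s in snp_to_gene}


def jaccard(a, b):
    """Overlap of two sets as |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical SNP-to-gene assignments and cohort results.
snp_to_gene = {
    "snp1": "GENE_A", "snp2": "GENE_A",
    "snp3": "GENE_B", "snp4": "GENE_B",
    "snp5": "GENE_C", "snp6": "GENE_D",
}
cohort1_snps = {"snp1", "snp3", "snp5"}
cohort2_snps = {"snp2", "snp4", "snp6"}

snp_overlap = jaccard(cohort1_snps, cohort2_snps)
gene_overlap = jaccard(genes_hit(cohort1_snps, snp_to_gene),
                       genes_hit(cohort2_snps, snp_to_gene))
print(f"SNP-level overlap:  {snp_overlap:.2f}")
print(f"gene-level overlap: {gene_overlap:.2f}")
```

The two cohorts share no SNPs, yet half of the implicated genes agree once hits are collapsed to the gene level.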

On top of the gene-level integration, CFG provides a way to 
prioritize genes based on disease relevance, not study-specific 
effects (that is, fit-to-disease as opposed to fit-to-cohort). 
Reproducibility of findings across different studies, experimental 
paradigms and technical platforms is deemed more important (and 
scored as such by CFG) than the strength of finding in an individual 
study (for example, P-value in a genome-wide association study 
(GWAS)). This Bayesian-like approach minimizes false positives if 
one focuses on the top of the distribution, and minimizes false 
negatives if one goes deeper down the list (Figure 2). Most 
importantly, the CFG-prioritized genes show reproducibility and 
predictive ability in independent cohorts, which is the key litmus 
test for genetic and biomarker studies. Once the genes are
identified and prioritized, biological pathway analyses can be
conducted and mechanistic models can be constructed.

Figure 2. Top candidate genes for schizophrenia: Convergent Functional Genomics (CFG) analysis of ISC genome-wide association study (GWAS). iPS cell, induced pluripotent stem cell; ISC, International Schizophrenia Consortium.
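The prioritization principle described above, in which reproducibility across independent lines of evidence outweighs the strength of a finding in a single study, can be sketched in Python. The assumptions are that candidates are ranked by the number of independent lines supporting them, with the single-study GWAS P-value used only to break ties; the genes, P-values and helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    gene: str
    gwas_p: float                              # P-value in a single GWAS
    lines: set = field(default_factory=set)    # independent lines of evidence


def rank_by_reproducibility(cands):
    """Rank by number of independent lines; the GWAS P-value only breaks ties."""
    return sorted(cands, key=lambda c: (-len(c.lines), c.gwas_p))


def rank_by_p_value(cands):
    """Rank by single-study strength alone, for comparison."""
    return sorted(cands, key=lambda c: c.gwas_p)


def top_k(ranked, k):
    """Genes at the top of a ranked list."""
    return [c.gene for c in ranked[:k]]


cands = [
    Candidate("GENE_X", 1e-9, {"gwas"}),
    Candidate("GENE_Y", 1e-3, {"gwas", "postmortem_brain", "animal_model_brain"}),
    Candidate("GENE_Z", 1e-4, {"gwas", "human_blood"}),
]

print(top_k(rank_by_p_value(cands), 3))
print(top_k(rank_by_reproducibility(cands), 3))
```

Under these assumptions, GENE_X leads on P-value alone but drops to last once support from independent lines is counted. Choosing a small k keeps only the best-supported genes (fewer false positives), while a larger k goes deeper down the list (fewer false negatives).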

Using a set of mouse experiments as a driving force, 20,23 or 
using human blood gene expression 5,6 or GWAS data 22,24,27 as a 
driving force, such convergent studies from my group and others 
have identified and prioritized candidate genes and biomarkers 
for psychiatric disorders (bipolar disorder, 22,24 schizophrenia, 21,27 
anxiety disorders, 29 alcoholism 19,30) that show good reproducibility
as well as predictive ability in independent cohorts. In
essence, the CFG approach is a de facto field-wide collaboration, 
integrating together the best available evidence at the time the 
analyses are conducted. Periodic re-analyses as future evidence 
accumulates in the field can improve and refine the results. 

APPLICATION OF CFG TO STEM CELL-DERIVED DATA 

Data generated from neuronal-like cells derived from iPSCs can be 
cross-validated and prioritized using a CFG approach with other 
lines of evidence (Figure 1), or can serve as a line of evidence itself 
for the cross-validation and prioritization of, for example, GWAS 
data (Figure 2). 

We have used the latter approach for schizophrenia. 27 Data
published by Gage and colleagues from schizophrenia subjects 10
in iPSC-derived neuronal-like cells ('hiPSC neurons') were used as
one of the multiple lines of evidence in a convergent approach
that incorporated, besides GWAS data, 31 human postmortem data,
human blood gene expression data 6 and animal model
pharmacogenomics brain and blood gene expression data
(using phencyclidine and clozapine as agonist-antagonist
pharmacological agents 21). In all, 21% (9 out of 42) of the top
schizophrenia candidate genes identified by us in our overall CFG
analysis had evidence in the hiPSC neurons study, and in 6 of these
9 genes the direction of change in expression in iPS-derived cells
was the same as that in postmortem brains from schizophrenics
(HSPA1B, TCF4, CD9, KALRN, PRKCA and NRG1) (Figure 2). Given that
the 'hiPSC neurons' data in the original study were derived from
only n = 4 schizophrenic subjects, 10 and that there is intra-subject
as well as inter-subject variability in cell lines, generating a large
(596 unique genes) and potentially noisy list of differentially
expressed genes, the use of cross-validating approaches such as
CFG was essential to pinpoint the most disease-relevant genes.
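The overlap and concordance figures reported here reduce to simple arithmetic, and the two quantities can be computed as follows. Apart from the reported counts (9 of 42; 6 of 9), the gene sets and directions below are hypothetical toy inputs, and the helper names are illustrative.

```python
def overlap_fraction(prioritized, other_study):
    """Fraction of prioritized genes that also appear in another study's gene list."""
    if not prioritized:
        raise ValueError("prioritized gene set is empty")
    shared = prioritized & other_study
    return len(shared) / len(prioritized), shared


def concordant_genes(dir_a, dir_b):
    """Genes present in both data sets and changed in the same direction (+1 up, -1 down)."""
    return {g for g in dir_a.keys() & dir_b.keys() if dir_a[g] == dir_b[g]}


# Arithmetic from the text: 9 of 42 top genes had hiPSC-neuron evidence,
# and 6 of those 9 changed in the same direction as in postmortem brain.
print(f"{9 / 42:.0%} overlap, {6 / 9:.0%} co-directional")

# Hypothetical toy inputs showing how the two quantities are computed.
prioritized = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
ipsc_dirs = {"GENE_A": +1, "GENE_B": -1, "GENE_C": +1}
postmortem_dirs = {"GENE_A": +1, "GENE_B": +1, "GENE_D": -1}

frac, shared = overlap_fraction(prioritized, set(ipsc_dirs))
print(frac, sorted(shared))
print(sorted(concordant_genes(ipsc_dirs, postmortem_dirs)))
```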

The case of HSPA1B (heat-shock 70-kDa protein 1B), for example, a
gene whose involvement in schizophrenia was previously obscure,
is illustrative of the utility of a non-hypothesis-driven, convergent
approach. HSPA1B, a chaperone involved in stress response,
stabilizes existing proteins against aggregation and mediates the
folding of newly translated proteins. HSPA1B has some previous
genetic evidence for association with schizophrenia. 32
It is co-directionally increased in expression in postmortem brains 33
and iPSC-derived neurons from schizophrenia patients. HSPA1B is
also decreased in expression by antipsychotic treatment with
clozapine in the brain and blood of a mouse model, based on our
previous work. 21 It was also co-directionally increased in the brain
and blood in a pharmacogenomic mouse model of anxiety disorders
that we have recently described, 29 as well as in a stress-reactive
genetic mouse model. 20 Treatment with the omega-3 fatty acid
docosahexaenoic acid reversed the increase in expression of HSPA1B
in this stress-reactive genetic mouse model. 30 Another closely related
molecule, HSPA1A (heat-shock 70-kDa protein 1A), is also present on
our list of prioritized candidate genes for schizophrenia, with a lower
CFG score of 3.5. 27 Heat-shock proteins may be involved in the
biological and clinical overlap and interdependence between
response to stress, 34 anxiety and psychosis.
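The convergent reasoning for HSPA1B can be written down as a small consistency check. The directions below are taken only from the evidence described in the preceding paragraph (+1 for increased, -1 for decreased expression); grouping the entries into "illness" (patients or illness models) and "treatment", and the is_coherent rule that illness-related changes agree while treatment reverses them, are a simplified sketch and not how the CFG score of HSPA1B was actually computed.

```python
# Directions (+1 increased, -1 decreased) for HSPA1B, as described in the text.
HSPA1B_EVIDENCE = [
    ("human postmortem brain, schizophrenia", "illness", +1),
    ("hiPSC neurons, schizophrenia", "illness", +1),
    ("mouse brain and blood, clozapine", "treatment", -1),
    ("pharmacogenomic mouse model of anxiety, brain and blood", "illness", +1),
    ("stress-reactive genetic mouse model", "illness", +1),
    ("stress-reactive mouse model, docosahexaenoic acid", "treatment", -1),
]


def is_coherent(evidence):
    """True if illness-related changes agree in direction and every
    treatment-related change runs opposite to them (treatment reverses it)."""
    illness = {d for _, kind, d in evidence if kind == "illness"}
    treatment = {d for _, kind, d in evidence if kind == "treatment"}
    if len(illness) != 1:
        return False
    (illness_dir,) = illness
    return all(d == -illness_dir for d in treatment)


print(len(HSPA1B_EVIDENCE), "entries; coherent:", is_coherent(HSPA1B_EVIDENCE))
```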

A CFG approach could also be used in cases where HDACi are
used for transformation, to understand which genes expressed in
iPSC-derived cells are drug-modulated. We have generated in our
lab valproate brain and blood gene expression data sets 5,23 from 
mouse models, which could serve such a role. 
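One way to factor out the effect of an HDACi such as valproate is to flag any gene changed in iPSC-derived cells that is also changed by the drug in reference data. This sketch assumes both lists are already expressed as human gene symbols (the mouse data would first need ortholog mapping, which is not shown); the gene names and the split_by_drug_effect helper are hypothetical.

```python
def split_by_drug_effect(ipsc_genes, drug_modulated):
    """Partition iPSC-derived-cell genes by whether the reprogramming drug also changes them.

    Returns (retained, flagged); flagged genes are set aside for closer
    scrutiny rather than silently discarded.
    """
    flagged = ipsc_genes & drug_modulated
    return ipsc_genes - flagged, flagged


# Hypothetical inputs: genes changed in iPSC-derived cells, and genes changed by
# valproate in mouse brain/blood data (assumed already mapped to human symbols).
ipsc_changed = {"GENE_A", "GENE_B", "GENE_C"}
valproate_changed = {"GENE_B", "GENE_Q"}

retained, flagged = split_by_drug_effect(ipsc_changed, valproate_changed)
print(sorted(retained), sorted(flagged))
```

Flagging rather than deleting is a deliberate choice in this sketch: a drug-modulated gene may still be disease-relevant, so it seems safer to set it aside for closer scrutiny than to discard it outright.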



CONCLUSION 

Convergent approaches may be important for mining and 
interpreting gene expression data from pluripotent stem cell- 
derived cells in psychiatric and non-psychiatric disorders. 